Using sampling methods to improve binding site predictions

Authors

  • Yi Sun
  • Mark Robinson
  • Rod Adams
  • I. René J. A. te Boekhorst
  • Alistair G. Rust
  • Neil Davey
Abstract

Currently the best algorithms for transcription factor binding site prediction are severely limited in accuracy. In previous work we combined random-selection under-sampling with the SMOTE over-sampling technique, working with several classification algorithms from the machine learning field to integrate binding site predictions. In this paper, we improve the classification results with the aid of Tomek links, used as either an under-sampling or a cleaning technique.
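
The abstract does not give the authors' procedure, so the following is only a minimal sketch of the two general ideas it names, under stated assumptions. A Tomek link is taken here in its usual sense: a pair of points from opposite classes that are each other's nearest neighbour. Used for under-sampling, only the majority-class member of each link is dropped; used for cleaning, both members are dropped. The over-sampling step shows only the core interpolation idea behind SMOTE. The function names (`tomek_links`, `remove_tomek`, `smote_like`), the Euclidean distance, and the toy data are all illustrative assumptions, not taken from the paper.

```python
import math
import random


def nearest_neighbour(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j), i < j, with opposite labels that are mutual nearest neighbours."""
    nn = [nearest_neighbour(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_tomek(points, labels, majority, mode="undersample"):
    """mode='undersample' drops only the majority member of each link;
    mode='clean' drops both members."""
    drop = set()
    for i, j in tomek_links(points, labels):
        for k in (i, j):
            if mode == "clean" or labels[k] == majority:
                drop.add(k)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


def smote_like(minority, n_new, k=2, rng=None):
    """Create n_new synthetic points, each interpolated between a minority point
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data (hypothetical): label 0 is the majority class, label 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0),
          (0.15, 0.0), (3.0, 3.0), (3.2, 3.0)]
labels = [0, 0, 0, 0, 1, 1, 1]

links = tomek_links(points, labels)
under_pts, under_lbl = remove_tomek(points, labels, majority=0, mode="undersample")
clean_pts, clean_lbl = remove_tomek(points, labels, majority=0, mode="clean")
minority = [p for p, l in zip(points, labels) if l == 1]
extra = smote_like(minority, n_new=3)
```

In this toy set the only link is the pair at indices 1 and 4, so under-sampling removes one majority point while cleaning removes that point and its minority partner. Which variant helps more, and in what order it should be combined with over-sampling, is an empirical question that the sketch does not settle.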

Similar resources

Using pre & post-processing methods to improve binding site predictions

Currently the best algorithms for transcription factor binding site prediction within sequences of regulatory DNA are severely limited in accuracy. In this paper, we integrate 12 original binding site prediction algorithms, and use a ‘window’ of consecutive predictions in order to contextualise the neighbouring results. We combine either random selection or Tomek links under-sampling with SMOTE...

Integrating binding site predictions using meta classification methods

Currently the best algorithms for transcription factor binding site prediction are severely limited in accuracy. There is good reason to believe that predictions from these different classes of algorithms could be used in conjunction to improve the quality of predictions. In this paper, we apply single layer networks and support vector machines on predictions from key algorithms. Furthermore, w...

Using Real-Valued Meta Classifiers to Integrate and Contextualize Binding Site Predictions

Currently the best algorithms for transcription factor binding site predictions are severely limited in accuracy. However, a non-linear combination of these algorithms could improve the quality of predictions. A support-vector machine was applied to combine the predictions of 12 key real valued algorithms. The data was divided into a training set and a test set, of which two were constructed: f...

GalaxySite: ligand-binding-site prediction by using molecular docking

Knowledge of ligand-binding sites of proteins provides invaluable information for functional studies, drug design and protein design. Recent progress in ligand-binding-site prediction methods has demonstrated that using information from similar proteins of known structures can improve predictions. The GalaxySite web server, freely accessible at http://galaxy.seoklab.org/site, combines such info...

Effect of Using Varying Negative Examples in Transcription Factor Binding Site Predictions

Background: Identifying transcription factor binding sites (TFBSs) computationally is a hard problem, as it produces many false predictions. Combining the predictions from existing predictors using classification methods such as Support Vector Machines (SVMs) can improve the overall predictions. But conventional negative examples (that is, examples drawn from non-binding sites) in this ...

Publication date: 2006